ABSTRACT

The development of a single-value revision indicator that would
use learner performance data obtained from a pretest-posttest
design to rank a set of instructional modules as to their relative
need for revision is discussed. A set of procedures was developed
in connection with the implementation of the Production,
Implementation, Evaluation, and Revision of Instructional Modules
(PIERIM) Model for design of instruction. A comparison of the
similarities and differences between using the modules in a
conventional classroom environment and using them in a
self-instruction environment is presented as a frame of reference
for the analysis and interpretation of the learner performance data
reported in Tables 1 and 2. The correlation coefficient (r = .83)
indicated substantial agreement between the rankings of the
instructional modules using the revision indicators derived from the
learner performance data for Group 1 and Group 2. This methodology
appears to be one method for the better utilization of data derived
from learner performance during the formative evaluation of
instructional materials. (CK)

ANALYSIS OF PERFORMANCE DATA FOR INSTRUCTIONAL DESIGN PROJECTS

by Gary Lipe 



1. PROBLEM

A selected review of the research related to the areas
of criterion-referenced measures and evaluation provides a
background for the discussion of the development of a Revision
Indicator to be used in connection with the formative evaluation
of instructional modules.



Criterion-Referenced Measures 

Glaser (1963; 1967), Glaser and Cox (1968), and Popham
and Husek (1969) discussed not only the similarities and differences
between norm-referenced measures and criterion-referenced measures,
but also the application of criterion-referenced measures to
evaluation of instruction. A criterion-referenced test was
operationally defined to include any measure which:

1. Assesses learner performance in relation to a
predetermined standard of performance.

2. Provides information as to the level of performance
by each learner which is independent of reference
to the performance of other learners (after Glaser,
1968 and Glaser and Cox, 1968).

Ebel (1962) discussed ten principles which should be
considered when tests of educational achievement were being
prepared and used. The first five principles were considered to be
equally applicable to criterion-referenced measures:

1. The measurement of educational achievement is 
essential to effective education. 

2. An educational test is no more or less than a 
device for facilitating, extending, and refining 
a teacher's observation of student achievement. 

3. Every important outcome of education can be 
measured. 

4. The most important educational achievement is 
command of useful knowledge. 

5. Written tests are well suited to measure the 
student's command of useful knowledge (pp. 20-22).



Evaluation 

Definition 



Merwin (1969) reviewed the historical development and
changing concept of evaluation and concluded that "concepts of
evaluation have developed in response to needs for evaluational
practices . . . (p. 25)." The combination of ideas from Stake's
(1967) discussion of curriculum evaluation, Scriven's (1967)
discussion of formative evaluation, and Wittrock's (1969)
discussion of evaluation of instruction resulted in the following
definition:

Formative evaluation is the collection, processing,
and interpretation of data for the purpose of
describing and making judgement as to the quality and
appropriateness of behavioral objectives, instructional
materials, environments, and learner performance,
and utilizing the results to make decisions concerning
the modification of the instructional system from
which the data was derived.

Modification of a system based on data derived from the
system (e.g., output) implies feedback. Feedback has generally
been defined as any output of a system which either directly or
indirectly serves as future input to the system. Within the
context of a system model for design of instruction, the role of the
evaluator is to utilize the output of the system to identify
possible weaknesses within the system which, if corrected, would
increase the efficiency of the total system and/or the proportion of
learners attaining the specified standard of performance. Feedback
to the instructor provides the information required to make
decisions concerning the modification of instructional materials
and/or procedures (Bloom, 1968, 1969; Cronbach, 1963; Glaser, 1965;
Tyler, 1949, 1951; Wittrock, 1969). The information can also be
used to modify the product of any of the steps in a system model
for design of instruction (Briggs, 1970; Dick, 1969).

There are few specific guidelines concerning the data 
to be collected, techniques for analyzing the data, or decision 
strategies for assigning priorities to the changes which must be 
made to an instructional system. Recommendations are reviewed 
for test items and instructional materials. 

Test Items

System models for design of instruction and mastery
models each identify the first concern in evaluating test items,
which is to establish the content validity of the item (Bloom,
1968; Cronbach, 1963; Ebel, 1956; Husek, 1969; Popham & Husek,
1969; Tyler, 1949; Wittrock, 1969). When test items are derived
directly from statements of behavioral objectives, as they are
in a system model for design of instruction, the content validity
of the item has been established.

Empirical testing of test items, using both individual
and small group procedures, has been recommended by Tyler (1949).
The method of scoring the performance of a learner should be
made as objective as possible (Bloom, 1969; Lindvall & Cox, 1969;
Tyler, 1949; Wittrock, 1969), and the basis of scoring should
be made known to the learner (Wittrock, 1969). Evans (1968)
recommended the use of multiple-choice type items whenever possible
and contended that the ultimate operational definition of the
instructional system's objectives is the posttest used to
evaluate the learner's performance.

Cox and Vargas (1966), Glaser and Cox (1968), Hills
(1970), Husek (1969), Moxley (1970), Popham (1970), and Popham
and Husek (1969) have all expressed concern because of the lack
of appropriate methods of analyzing data from criterion-referenced
measures of learner performance. The suggested recommendations
have been very general in nature, such as: the proportion of
learners passing an item should be low on the pretest and high
on the posttest (Glaser & Cox, 1968; Moxley, 1970), and a negative
discriminator in an item pool should be carefully analyzed (Popham
& Husek, 1969). Specific procedures for item analysis, based
on the pretest-posttest design, have been discussed by Cox and
Vargas (1966) (e.g., pretest-posttest difference index) and Popham
(1970) (e.g., fourfold analysis of pretest-posttest learning
states).
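
The two item-analysis procedures named above can be sketched in a
few lines of Python. This is a minimal illustration under stated
assumptions, not the published procedure of either source: the
difference index is taken here to be the proportion of learners
passing an item on the posttest minus the proportion passing it on
the pretest, and the fourfold analysis is taken to be a count of
learners in each pretest/posttest pass-fail combination. The
function names and the sample responses are hypothetical.

```python
from collections import Counter


def difference_index(pre, post):
    """Proportion passing on the posttest minus proportion passing on the pretest.

    pre and post are equal-length lists of 0/1 scores for one item,
    one entry per learner (an assumed layout, for illustration only).
    """
    if len(pre) != len(post) or not pre:
        raise ValueError("pre and post must be non-empty and the same length")
    return sum(post) / len(post) - sum(pre) / len(pre)


def fourfold_states(pre, post):
    """Count learners in each pretest/posttest state for one item."""
    labels = {
        (1, 1): "pass/pass",
        (0, 1): "fail/pass",
        (1, 0): "pass/fail",
        (0, 0): "fail/fail",
    }
    counts = Counter(labels[(a, b)] for a, b in zip(pre, post))
    return {name: counts.get(name, 0) for name in labels.values()}


# Hypothetical responses of eight learners to a single item.
pre_item = [0, 0, 0, 1, 0, 0, 1, 0]
post_item = [1, 1, 0, 1, 1, 0, 0, 1]

index = difference_index(pre_item, post_item)
states = fourfold_states(pre_item, post_item)
print(index)
print(states)
```

A large positive index together with a small fail/fail count is the
pattern that the general recommendations quoted above would lead one
to expect for a well-functioning item.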

Instructional Material 

The pretest-posttest design has been widely recommended
and is essential if learning is to be inferred from changes in the
learner's performance before and after interacting with an
instructional system (Deterline, 1957; Glaser & Cox, 1968; Lindvall
& Cox, 1969; Lumsdaine, 1965; Provus, 1969; Tyler, 1949; Wittrock,
1969). The pretest-posttest design is considered a minimal design
by Tyler (1949), and additional observations of the learner's
performance were recommended to estimate the retention of the
performance. When the only data available to an evaluator is
from a pretest-posttest design, it is exceedingly difficult to
determine which element of the instructional system should be
revised.

The problem was to develop a single-value Revision
Indicator which would utilize learner performance data obtained
from a pretest-posttest design to rank a set of instructional
modules as to their relative need for revision.

2. PROCEDURES 

The following set of procedures was developed in
connection with the implementation of the Production, Implementation,
Evaluation, and Revision of Instructional Modules (PIERIM) model
for design of instruction (Lipe, 1970). A comparison of the
similarities and differences which existed during Phase 2
(Implementation and Evaluation of Instructional Modules in a
Conventional Classroom Environment) and Phase 4 (Implementation and
Evaluation of Instructional Modules in a Self-Instruction
Environment) of the PIERIM model provides a frame of reference for
the analysis and interpretation of the learner performance data
reported in Tables 1 and 2.

Table 1. Learner performance data, Phase [caption truncated in source]

Column headings: Instructional Module; Number of Items; Mean
(Pretest, Posttest); Standard Deviation (Pretest, Posttest).
[Table bodies for Tables 1 and 2 not recovered.]

The similarities which existed between the two
implementations of the instructional modules included:

1. Course: The evaluation unit of EED 405, Classroom
Organization and Pupil Evaluation, was used to implement the
instructional modules.

2. Instructor: The same graduate assistant instructor
was given complete responsibility for the evaluation unit.

3. Population: The learners were all elementary
education majors in either their junior or senior year at The
Florida State University.

4. Length of Unit: The evaluation unit was allocated
a total of nine one-hour class sessions.

The significant differences between the two implementations
of the instructional modules were:

1. Test Items: A set of 42 multiple choice test items
was used to measure the learners' performance on the 16
instructional modules which specified multiple choice items as the
method of evaluation. Three test items were replaced and 11 test
items were modified during the revision of the instructional
materials.

2. Testing Procedures: The time between the pre- and
posttest was reduced from 16 calendar days during Phase 2 to 8
calendar days during Phase 4.

3. Sample Size: Nineteen learners participated in Phase
2 and 28 learners participated in Phase 4 of the PIERIM model.

Interpretation of Learner Performance 

The learners' performance can be expected to deviate
from the performance predicted by criterion-referenced measurement
and mastery models of learning to the extent that the following
assumptions, implicit in the procedures used to design and/or
implement the instructional modules and tests, are violated:

1. Learners enter the instructional system in an
unlearned state.

2. Learners who interact with the instructional
resources specified change from an unlearned
to a learned state.

3. Learners possess any prerequisite competencies
required to interact with the instructional
resources that are identified for the instructional
modules.

4. Learners have sufficient time to achieve mastery
on each instructional module.

5. Test items for each instructional module represent
a homogeneous sample of the performance
described by the behavioral objective.

The learners' performance was measured for the set of
16 instructional modules using the same form of a 42-item
multiple choice test as both the pre- and posttest in a One Group
Pretest-Posttest Design. Revisions were made to the test during
Phase 3 of the PIERIM model, and this factor should be considered
when comparing the performance of Group 1 (i.e., Conventional
Classroom Group) and Group 2 (i.e., Self-Instruction Group). The
sample sizes for Group 1 and Group 2 were 19 and 28 learners,
respectively.



Violation of Statistical Assumptions 

The interpretation of learner performance data is further
complicated by the use of intact classroom groups to study the
effects of the instructional materials and/or procedures on the
learners' performance. The use of intact classroom groups
violates one of the basic underlying assumptions of inferential
statistics (i.e., random sampling of learners from the population).
The assumption that the underlying distribution of the trait
being evaluated approximates the normal distribution is violated
as the actual effectiveness of the instructional materials and/or
procedures approaches the theoretical limit of 100 percent
effectiveness. Non-parametric statistics (i.e., phi coefficients
and McNemar's Test) were therefore selected for analysis of the
learner performance data and are reported by the Instructional
Support System (ISS) computer program STAT, because no assumptions
are required concerning the underlying distribution of the
performance data.
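
For readers unfamiliar with the two statistics named above, the
following Python sketch shows the textbook formulas for a 2 x 2
table. It is not the STAT program, whose exact computations are not
given here; in particular, whether STAT applied a continuity
correction to McNemar's Test is not stated, and none is applied
below. The function names and the counts are hypothetical.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi coefficient for the 2 x 2 table [[a, b], [c, d]].

    For two test items: a = both correct, b = first correct only,
    c = second correct only, d = both incorrect.
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        return None  # undefined when any marginal total is zero
    return (a * d - b * c) / denom


def mcnemar_chi_square(fail_to_pass, pass_to_fail):
    """McNemar's chi-square (1 df, no continuity correction).

    Only learners who changed state between pretest and posttest
    contribute to the statistic.
    """
    changed = fail_to_pass + pass_to_fail
    if changed == 0:
        return None  # no learner changed state
    return (fail_to_pass - pass_to_fail) ** 2 / changed


# Hypothetical counts for a group of 28 learners.
phi = phi_coefficient(10, 2, 3, 13)
chi_square = mcnemar_chi_square(12, 2)
print(phi, chi_square)
```

A negative phi between two items written for the same behavioral
objective, or a small McNemar statistic for an item, would each
signal that the item or the related instruction deserves a closer
look.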

The purpose of designing and implementing the
instructional modules in a self-instruction environment was for the
learners to achieve at least the standard of performance specified
for each of the instructional modules. Learning is inferred from
gains in the proportion of learners achieving the standard of
performance from pretest to posttest. It is important to remember
that the research design utilized (i.e., One Group Pretest-
Posttest Design) makes it impossible to separate the gains
attributable to the effects of testing from the gains attributable to
the instructional treatment. Utilizing the proportion of learners
achieving at least the standard of performance on the pretest and
posttest, the gains from pretest to posttest and the ratio of the
gains to potential gain are reported for each instructional
module (see Table 3).



[Table 3. Column headings: Instructional Module; Proportions
(Pretest, Posttest); Gain 1; Gain 2; Ratio (Gain 1/Gain 2). Table
body not recovered.]

Any arbitrary standard can be selected as the performance
standard for a system model for design of instruction. For
purposes of illustration, 70 percent is selected
as the system standard for the PIERIM model. The learners
achieved the system standard of performance on four of the 16
instructional modules on the pretest and on 10 of the 16
instructional modules on the posttest (see Table 3). For at least
the four instructional modules on which the system standard of 70
percent was achieved on the pretest, there is reason to suspect
that the topic had been previously taught in other
education courses or that the instructional objective was so obvious
as not to require instruction. A comparison of the ratios of
gains to potential gains requires the assumption that a gain from
.800 to .900 (i.e., .10/.20 = .500) is equivalent to a gain
from .400 to .700 (i.e., .30/.60 = .500).

Revision Indicator for Instructional Modules 

When the instructor of the elementary education course
reviewed the set of summary reports produced by the ISS program
STAT, he reported that the volume of information contained in the
reports was overwhelming. It was determined that a single rank
indicator for each instructional module would be an asset to the
instructor and educational technologist by directing their efforts
during the revision of the instructional modules. Neither the
summary reports produced by the computer programs nor the Revision
Indicator has actually been utilized to support Phase 3 of the
PIERIM model.

The rationale for the Revision Indicator was to select
a number of statistics which were available to the instructor
and educational technologist, and to predict the direction in which
each statistic would be expected to change on the basis of
criterion-referenced measurement and/or mastery models of
learning. The Revision Indicator is a single composite value derived
from the following statistics:

1. Mean: The posttest mean is predicted to be greater
than the pretest mean. The means for Group 1 and Group 2 (see
Tables 1 and 2) indicate that the mean of each instructional
module did in fact increase from pretest to posttest.

2. Standard Deviation: The posttest standard deviation
is predicted to be less than the pretest standard deviation. The
standard deviations for Group 1 (see Table 1) and Group 2 (see
Table 2) indicate that for some of the instructional modules the
standard deviations changed in the opposite direction.

3. Maximum Pretest Score: Learners who achieve a
maximum score on the pretest are predicted to achieve mastery on the
posttest.

4. Posttest Scores of Zero: Fewer than 5% of the learners
are predicted to be in an unlearned state on each of the items
related to an instructional module.

5. Phi Coefficients: Each of the inter-item phi
coefficients for a set of items related to an instructional module
is predicted to be positive. The total number of negative phi
coefficients is calculated for the set of items for each instructional
module.

6. Proportion of Correct Answers: The proportion of
learners who answered an item correctly on the posttest is
predicted to be greater than .50.

7. Alternatives for Test Items: It is predicted that,
on the pretest, at least one learner will select each alternative
of the multiple choice items.

8. Posttest Performance: When the group of learners
is divided into upper and lower 50% on the basis of total
test score, at least 80% of the learners in the upper 50% are
predicted to answer the item correctly.

9. Fail/Fail Category of Performance: The mean
proportion of the learners in the fail/fail category of performance
was calculated for Group 1 and Group 2 and each was found to
approximate .25. The proportion of learners in the fail/fail
category is predicted to be less than .25.

Instructional modules and/or test items are categorized as
positive (+) if there is agreement between the observed and predicted
direction of change. The instructional modules and/or test items
are categorized as negative (-) if there is disagreement between
the observed and predicted direction of change. The negative
indicators are totaled for each instructional module and the total
is referred to as the Revision Indicator.
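
The tallying rule described above can be illustrated with a short
Python sketch. Only four of the nine statistics are shown (mean,
standard deviation, proportion of correct answers, and fail/fail
proportion); the function names, the data layout, and all of the
values are hypothetical and are not taken from Tables 1 through 4.

```python
from typing import List, Tuple

# (label, True when the observed direction agrees with the prediction)
Check = Tuple[str, bool]


def module_checks(pre_mean, post_mean, pre_sd, post_sd) -> List[Check]:
    """Module-level predictions: mean rises, standard deviation falls."""
    return [
        ("mean", post_mean > pre_mean),
        ("standard deviation", post_sd < pre_sd),
    ]


def item_checks(item, post_p_correct, fail_fail_p) -> List[Check]:
    """Item-level predictions: p(correct) > .50 and fail/fail < .25."""
    return [
        (item + ": proportion correct", post_p_correct > 0.50),
        (item + ": fail/fail", fail_fail_p < 0.25),
    ]


def revision_indicator(checks: List[Check]) -> int:
    """Total number of negative (-) indicators for one module."""
    return sum(1 for _label, agrees in checks if not agrees)


# Hypothetical data for two modules.
module_a = (module_checks(1.2, 2.4, 0.9, 0.7)
            + item_checks("item 1", 0.82, 0.10)
            + item_checks("item 2", 0.45, 0.30))
module_b = (module_checks(0.8, 1.9, 0.6, 0.8)
            + item_checks("item 3", 0.90, 0.05))

indicators = {
    "Module A": revision_indicator(module_a),
    "Module B": revision_indicator(module_b),
}
# Modules with more negative indicators are ranked first for revision.
ranked = sorted(indicators, key=indicators.get, reverse=True)
print(indicators, ranked)
```

Because item-level statistics contribute one indicator per item, a
module with many items can accumulate more negative indicators than
there are statistics in the list.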

3. RESULTS 

Using the performance data for Group 1 and Group 2,
Revision Indicators were calculated for each instructional module
(see Table 4). There is substantial agreement between the
rankings of the instructional modules using the Revision Indicators
derived from the learner performance data for Group 1 and Group
2 (r_s = .83). The same three instructional modules and related
test items (Measures of Central Tendency, Pretest/Posttest, and
Test Items) were identified as being in need of review and possible
revision. The Pretest/Posttest instructional module was the only
one of the three instructional modules identified which had actually
been revised during Phase 3 of the PIERIM model.

Instructional Module                   Group 1    Group 2

Pretest/Posttest                          12         10
Behavioral Objectives                      3          5
Test Items                                11         10
Percentile Ranks                          10          6
Measures of Central Tendency              13         12
Normal Distribution                        5          8
Normal Curve                               2          1
Correlation Coefficient                    3          2
Correlation/Scatter Diagram                4          5
Validity                                   5          7
Reliability/Factors Affecting              7          7
Reliability/Interpretation                 6          6
Standard Error of Measurement              2          3
Types of Tests                             5          3
Test Norms/Intelligence Quotient           7          7
Standardized Test Information              5          7

Numbers represent the total number of negative (-)
indicators for an instructional module.
Group 1 represents the 19 learners who participated in Phase 2.
Group 2 represents the 28 learners who participated in Phase 4.

Table 4. Revision Indicators for instructional modules
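
The agreement between the two sets of rankings can be checked
directly from the Table 4 values. The Python sketch below assumes
that the reported coefficient is a Spearman rank correlation
computed with the simple sum-of-squared-rank-differences formula,
with tied indicators given the average of their ranks; whether a
correction for ties was applied is not stated. The function names
are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); ties receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman_rs(x, y):
    """Spearman coefficient from the simple formula 1 - 6*sum(d^2)/(n(n^2-1))."""
    n = len(x)
    d_squared = sum((a - b) ** 2
                    for a, b in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Revision Indicators from Table 4, in table order.
modules = ["Pretest/Posttest", "Behavioral Objectives", "Test Items",
           "Percentile Ranks", "Measures of Central Tendency",
           "Normal Distribution", "Normal Curve", "Correlation Coefficient",
           "Correlation/Scatter Diagram", "Validity",
           "Reliability/Factors Affecting", "Reliability/Interpretation",
           "Standard Error of Measurement", "Types of Tests",
           "Test Norms/Intelligence Quotient", "Standardized Test Information"]
group_1 = [12, 3, 11, 10, 13, 5, 2, 3, 4, 5, 7, 6, 2, 5, 7, 5]
group_2 = [10, 5, 10, 6, 12, 8, 1, 2, 5, 7, 7, 6, 3, 3, 7, 7]

rs = spearman_rs(group_1, group_2)


def top_three(scores):
    """Names of the three modules with the largest Revision Indicators."""
    by_name = dict(zip(modules, scores))
    return set(sorted(by_name, key=by_name.get, reverse=True)[:3])


print(round(rs, 2), top_three(group_1) == top_three(group_2))
```

Under these assumptions the coefficient rounds to .83, and the same
three modules head the list for both groups.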

4. IMPLICATIONS FOR FUTURE RESEARCH 

The preliminary work related to the development of the
Revision Indicator provides one method of ranking instructional
modules which are evaluated using criterion-referenced measures.
The methodology appears to be one method of better utilizing data
derived from learner performance during the formative evaluation
of instructional materials.

There is a need for the development of a simplified
method of ranking instructional modules as to their relative need
for revision and a rationale for terminating the revision
process for an individual instructional module. The preliminary
work related to the Revision Indicator could possibly be expanded
to include subjective ratings by the instructor and/or the learners.
Research related to the use of minimum change values in the
calculation of the Revision Indicator, rather than the simpler
dichotomy which classifies observed changes as being either in a
specified direction or in the opposite direction, could possibly
improve the sensitivity of the Revision Indicator.

A rationale is needed for selecting the criteria to
be used to terminate the revision process for an instructional
module. Should the criteria be the same for instructional
modules produced by a selection model and a design model? The
criteria of available time and financial resources between
successive implementations of the instructional modules must be
considered when the design goals of an instructional system are
established.